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Abstract 

We consider a network of devices, such as generators, fixed loads, deferrable loads, 
and storage devices, each with its own dynamic constraints and objective, connected by 
lossy capacitated lines. The problem is to minimize the total network objective subject 
to the device and line constraints, over a given time horizon. This is a large optimiza- 
tion problem, with variables for consumption or generation in each time period for 
each device. In this paper we develop a decentralized method for solving this problem. 
The method is iterative: At each step, each device exchanges simple messages with its 
neighbors in the network and then solves its own optimization problem, minimizing 
its own objective function, augmented by a term determined by the messages it has 
received. We show that this message passing method converges to a solution when the 
device objective and constraints are convex. The method is completely decentralized, 
and needs no global coordination other than synchronizing iterations; the problems to 
be solved by each device can typically be solved extremely efficiently and in parallel. 
The method is fast enough that even a serial implementation can solve substantial prob- 
lems in reasonable time frames. We report results for several numerical experiments, 
demonstrating the method's speed and scaling, including the solution of a problem 
instance with over 30 million variables in 52 minutes for a serial implementation; with 
decentralized computing, the solve time would be less than one second. 



*A11 authors are members of the Information Systems Laboratory, Department of Electrical Engineering, 
Stanford University 
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1 Introduction 



A traditional power grid is operated by solving a number of optimization problems. At the 
transmission level, these problems include unit commitment, economic dispatch, optimal 
power flow (OPF), and security-constrained OPF (SCOPF). At the distribution level, these 
problems include loss minimization and reactive power compensation. With the exception 
of the SCOPF, these optimization problems are static with a modest number of variables 
(often less than 10000), and are solved on time scales of 5 minutes or more. However, the 
operation of next generation electric grids {i.e., smart grid) will rely critically on solving large- 
scale, dynamic optimization problems involving hundreds of thousands of devices jointly 
optimizing tens to hundreds of millions of variables, on the order of seconds rather than 
minutes |ET1H ILVH12j . More precisely, the distribution level of a smart grid will include 
various types of active dynamic devices, such as distributed generators based on solar and 
wind, batteries, deferrable loads, curtailable loads, and electric vehicles, whose control and 
scheduling amount to a very complex power management problem |TSBCll| ICI12j . 

In this paper, we consider a general problem, which we call the optimal power scheduling 
problem (OPSP), in which a network of dynamic devices are connected by lossy capacitated 
lines, and the goal is to jointly minimize a network objective subject to local constraints on 
the devices and lines. The network objective is the sum of the objective functions of each 
device. These objective functions extend over a given time horizon and encode operating 
costs such as fuel consumption and constraints such as limits on power generation or con- 
sumption. In addition, the objective functions encode dynamic objectives and constraints 
such as limits on ramp rates or charging limits. The variables for each device consist of its 
consumption or generation in each time period and can also include local variables which 
represent internal states of the device over time, such as the state of charge of a storage 
device. 

When all device objective functions and line constraints are convex, the OPSP is a convex 
optimization problem, which can in principle be solved efficiently |B V04] . If not all device 
objective functions are convex, we can solve a relaxed form of the OPSP which can be used 
to find good, local solutions to the OPSP. The optimal value of the relaxed OPSP also 
gives a lower bound for the optimal value of the OPSP which can be used to evaluate the 
suboptimality of a local solution. 

For any network, the corresponding OPSP contains at least as many variables as the 
number of devices multiplied by the length of the time horizon. For large networks with 
hundreds of thousands of devices and a time horizon with tens or hundreds of time periods, 
the extremely large number of variables present in the corresponding OPSP makes solving it 
in a centralized fashion computationally impractical, even when all device objective functions 
are convex. 

We propose a decentralized optimization method which efficiently solves the OPSP by 
distributing computation across every device in the network. This method, which we call 
prox-average message passing, is iterative: At each iteration, every device passes simple 
messages to its network neighbors and then solves an optimization problem that minimizes 
the sum of its own objective function and a simple regularization term that only depends on 
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the messages it received from its network neighbors in the previous iteration. As a result, 
the only non-local coordination needed between devices for prox-average message passing 
is synchronizing iterations. When all device objective functions are convex, we show that 
prox-average message passing converges to a solution of the OPSP. 

Our algorithm can be used several ways. It can be implemented in a traditional way 
on a single computer or cluster of computers, by collecting all the device constraints and 
objectives. We will demonstrate this use with an implementation that runs on a single 8-core 
computer. A more interesting use is in a peer-to-peer architecture. In this architecture, each 
device contains its own processor, which carries out the required local dynamic optimization 
and exchanges messages with its neighbors on the network. In this setting, the devices do 
not need to divulge their objectives or constraints; they must only support a simple protocol 
for interacting with the neighbors. Our algorithm ensures that the network power flows will 
converge to their optimal values, even though each device has very little information about 
the rest of the network, and only exchanges limited messages with its immediate neighbors. 

Due to recent advances in convex optimization [WBlOl IMB12t IMWBll] , in many cases 
the optimization problems that each device solves in each iteration of prox-average message 
passing can be executed at millisecond or even microsecond time-scales on inexpensive, 
embedded processors. Since this execution can happen in parallel across all devices, the 
entire network can execute prox-average message passing at kilohertz rates. We present a 
series of numerical examples to illustrate this fact by using prox-average message passing to 
solve instances of the OPSP with up to 10 million variables serially in 17 minutes. (This is on 
an 8-core computer; with 64 cores, the time would be around 2 minutes.) Using decentralized 
computing, the solve time would be essentially independent of the size of the network and 
measured in fractions of a second. 

We note that although the primary application for our method is power management, it 
can easily be adapted to more general resource allocation and graph-structured optimization 
problems [RAWlOj . 

Related work. The use of optimization in power systems dates back to the 1920s and has 
traditionally concerned the optimal dispatch problem |Hap77| , which aims to find the lowest 
cost method for generating and delivering power to consumers, subject to physical generator 
constraints. With the advent of computer and communication networks, many different 
ways to numerically solve this problem have been proposed |Zhu09j and more sophisticated 
variants of optimal dispatch have been introduced, such as OPF, economic dispatch, and 
dynamic dispatch |CR90j . which extend optimal dispatch to include various reliability and 
dynamic constraints. For reviews of optimal and economic dispatch as well as general power 
systems, we direct the reader to [BV99j and the book and review papers cited above. 

When modeling AC power flow, the OPSP is a dynamic version of the OPF [Car62], 
extending the latter to include many more types of devices such as storage units. The 
OPSP also introduces a time horizon with coupling constraints between variables across 
time periods. The OPF has been a fundamental problem in power systems for over 50 years 
and is known to be non-convex. Recently, it was shown that the OPF can be solved exactly in 
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certain circumstances by recasting it as a semidefinite program and solving its dual problem 
[LLT2] . 

Distributed optimization methods are naturally applied to power networks given the 
graph-structured nature of the transmission and distribution networks. There is an exten- 
sive literature on distributed optimization methods, dating back to the early 1960s. The 
prototypical example is dual decomposition |DW60t IEve63] , which is based on solving the 
dual problem by a gradient method. In each iteration, all devices optimize their local (pri- 
mal) variables based on current prices (dual variables). Then the dual variables are updated 
to account for imbalances in supply and demand, with the goal being to determine prices 
for which supply equals demand. 

Unfortunately, dual decomposition methods are not robust, requiring many technical 
conditions, such as strict convexity and finiteness of all local cost functions, for both theo- 
retical and practical convergence. One way to loosen these technical conditions is to use an 
augmented Lagrangian |Hes69t iPowGQj [Ber82] . resulting in the method of multipliers. This 
subtle change allows the method of multipliers to converge under mild technical conditions, 
even when the local cost functions are not strictly convex or necessarily finite. However, this 
method has the disadvantage of no longer being separable across subsystems. To achieve 
both separability and robustness for distributed optimization, we can instead use the alter- 
nating direction method of multipliers (ADMM) |GM75l [GM761 IEck94l IBPC+Ilj . ADMM is 
very closely related to many other algorithms, and is identical to Douglas- Rachford operator 
splitting; see, e.g., the discussion in [BPC"*"!!! §3.5]. 



Building on the work of |LL12j . a distributed algorithm was recently proposed |LZT11] 
to solve the dual OPF using a standard dual decomposition on subsystems that are maximal 
cliques of the power network. Augmented Lagrangian methods (including ADMM) have 
previously been applied to the study of power systems with static, single period objective 
functions on a small number of distributed subsystems, each representing regional power 
generation and consumption |KB97j . For an overview of related decomposition methods 
applied to power flow problems, we direct the reader to |KB00l IBalOG] and the references 
therein. 



Outline. The rest of this paper is organized as follows. In ^|2]we give the formal definition 
of our network model. In §31 we give examples of how to model specific devices such as 
generators, deferrable loads and energy storage systems in our formal framework. In §U 
we describe the role that convexity plays in the OPSP and introduce the idea of convex 
relaxations as a tool to find solutions to the OPSP in the presence of non-convex device 
objective functions. In §3 we derive the prox-average message passing equations. In §S]we 
present a series of numerical examples, and in §7] we discuss how our framework can be easily 
extended to include use cases we do not explicitly cover in this paper. 



5 



2 Network model 



2.1 Formal definition 

A network consists of a finite set of terminals T. a finite set of devices V, and a finite set 
of nets M . The sets T) and M are both partitions of T. Thus, each terminal is associated 
with exactly one device and exactly one net. Equivalently, a network can be defined as a 
bipartite graph with one set of vertices given by devices, the other set of vertices given by 
nets, and edges given by terminals. 

Each terminal t G T has an associated power schedule pt — (pt(l) , . . . , pt{T)) G R^, 
where T is a given time horizon. Here Ptir) is the amount of energy consumed by device d 
through terminal t in time period r, where t ^ d {i.e., terminal t is associated with device 
d). When Pt(r) < 0, —pt{T) is the energy generated by device d through terminal t in time 
period r. The set of all terminal power schedules is denoted p. This is a function from T 
(the set of terminals) into (time periods); we can identify p with a |T| x T matrix, whose 
rows are the terminal power schedules. 

For each device d & V, pd consists of the set of power schedules of terminals associated 
with d, which we identify with a \d\ x T matrix whose rows are taken from the rows of 
p corresponding to the terminals in d. Each device d has an associated objective function 
fd : rI'^I^'^ — > R U {+cxd}, where we set fd{Pd) = oo to encode constraints on the power 
schedules for the device. When fdipd) < oo, we say that Pd is a set of realizable power 
schedules for device d, and we interpret fdipd) as the cost (or revenue, if negative) to device 
d for the power schedules Pd- 

Similarly, for each net n G J\f, Pn consists of the set of power schedules of terminals 
associated with n, which we identify with a \n\ x T matrix whose rows are taken from 
the rows of p corresponding to the terminals in n. Each net n is a lossless energy carrier 
(commonly referred to as a bus in power systems literature), which is represented by the 
constraint 

J2pt{r)^o, T = i,...,r. (1) 

ten 

In other words, in each time period the power flows on each net balance. 

For any terminal, we define the average net power imbalance p : T ^ R^, as 

P^ = ^\T.Pt', (2) 

I "I t'en 

where t G n, i.e., terminal t is associated with net n. In other words, Pt{T) is the average 
power schedule of all terminals associated with the same net as terminal t at time r. We 
overload this notation for devices by defining pd = {pi^ , • • • , Vi\d\)i where device d is associated 
with terminals ii, . . . ,i\d\- Using an identical notation for nets, we can see that p„ simply 
contains \n\ copies of the average net power imbalance for net n. The net power balance 
constraint for all terminals can be equivalently expressed as p = 0. 
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Optimal power scheduling problem. We say that a set of power schedules p : T 
is feasible if fdijPd) < oo for a\\ d E V (i.e., all devices' power schedules are realizable), 
and p = {i.e., power balance holds across all nets). We define the network objective as 
fip) = Y^dev fdipd)- The optimal power scheduling problem (OPSP) is 

minimize f{p) , . 

subject to p = 0, 

with variable p : T ^ R^. We refer to p as optimal if it solves ([3]), i.e., globally minimizes 
the objective among all feasible p. We refer to p as locally optimal if it is a locally optimal 
point for ([3]). 

Dual variables and locational marginal prices. Suppose p^ is a set of optimal power 
schedules, that also minimizes the Lagrangian 

teTT=i 

with no power balance constraint, where : T ^ R"^. In this case we call y'^ a set of 
optimal Lagrange multipliers or dual variables. When p° is a locally optimal point, which 
also locally minimizes the Lagrangian, then we refer to y^ as a set of locally optimal Lagrange 
multipliers. 

The dual variables are related to the traditional concept of locational marginal prices 
: T — )■ R^ by rescaling the dual variables associated with each terminal according to the 
size of its associated net, i.e., £° = y^/\n\, where t E n. This rescaling is due to the fact 
that locational marginal prices are the dual variables associated with the constraints in ([1]) 
rather than their scaled form used in ([3]) |EW02j . 

2.2 Discussion 

We now describe our model in a more intuitive, less formal manner. Devices include genera- 
tors, loads, energy storage systems, and other power sources, sinks, and converters. Terminals 
are ports on a device through which power flows, either into or out of the device (or both, at 
different times, as in a storage device). Nets are used to model ideal lossless uncapacitated 
connections between terminals over which power is transmitted; losses, capacities, and more 
general connection constraints between a set of terminals can be modeled with the addition 
of a device and individual nets which connect each terminal to the new device. Our net- 
work model does not specify nor require a specific type of energy transport mechanism {e.g., 
DC, single or 3-phase AC), but rather can abstractly model arbitrary heterogeneous energy 
transport and exchange mechanisms. 

The objective function of a device is used to measure the cost (which can be negative, 
representing revenue) associated with a particular mode of operation, such as a given level of 
consumption or generation of power. This cost can include the actual direct cost of operating 
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Figure 1: A simple network {left), and its transformation into standard form {right). 



according to the given power schedules, such as a fuel cost, as well as other costs such as CO2 
generation, or costs associated with increased maintenance and decreased system lifetime due 
to structural fatigue. The objective function can also include local variables other than power 
schedules, such as the state of charge of a storage device. 

Constraints on the power schedules and internal variables for a device are encoded by 
setting the objective function to +00 for power schedules that violate the constraints. In 
many cases, a device's objective function will only take on the values and +00, indicating 
no local preference among feasible power schedules. 

2.3 Example transformation to abstract network model 

We illustrate how a traditional power network can be recast into our network model in figure 
[H The original power network, shown on the left, contains 2 loads, 3 buses, 2 generators, 
and a single battery storage system. We can transform this small power grid into our model 
by representing it as a network with 11 terminals, 8 devices (3 of them transmission lines), 
and 3 nets, shown on the right of figure [H Terminals are shown as small filled circles. Single 
terminal devices, which are used to model loads, generators, and the battery, are shown as 
boxes. The transmission lines are two terminal devices represented by solid lines. The nets 
are shown as dashed rounded boxes. Terminals are associated with the device they touch 
and the net in which they are contained. 

The set of terminals can be partitioned by either the devices they are associated with, or 
the nets in which they are contained. Figure [2] shows the network in figure [1] as a bipartite 
graph, with devices on the left and nets on the right. 
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Figure 2: The network in figure [T] represented as a bipartite graph. Devices (boxes) 
are shown on the left with their associated terminals (dots). The terminals are con- 
nected to their corresponding nets (solid boxes) on the right. 

3 Device examples 

We present a few examples of how common devices can be modeled in our framework. We 
note that these examples are intentionally kept simple, but could easily be extended to more 
refined objectives and constraints. In these examples, it is easier to discuss operational costs 
and constraints for each device separately. A device's objective function is equal to the 
device's cost function unless any constraint is violated, in which case we set the objective 
value to +00. 

Generator. A generator is a single-terminal device with power schedule Pgen, which gen- 
erates power over a range, P™™ < —p^^^ < p™'^^^ and has ramp-rate constraints 



which limit the change of power levels from one period to the next. Here, the operator 



imax 



is the forward difference operator, defined as 



{Dx){t) = x{r + 1) - x(r), r = 1, . . . , T - 1. 



The cost function for a generator has the separable form 



T 




T = l 
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where : R — R gives the cost of operating the generator at a given power level over a 
single time period. This function is typically, but not always, convex and increasing. It could 
be piecewise linear, or, for example, quadratic: 

(j)geniu) = aU^ + (3u, 

where a, /3 > 0. 

More sophisticated models of generators allow for them to be switched on or off, with 
an associated cost each time they are turned on or off. When switched on, the generator 
operates as described above. When the generator is turned off, it generates no power but 
can still incur costs for other activities such as idling. 

Transmission line. A transmission line is a two terminal device that transmits power 
across some distance with power schedules pi and p2 and zero cost function. The sum 
Pi + P2 represents the loss in the line and is always nonnegative. The difference pi — p2 
can be interpreted as twice the power flow from terminal one to terminal two. A line has a 
maximum flow capacity, which is given by 

as well as a loss function, i{pi,P2) '■ R^^"^ — J- R+, which defines the constraint 

Pi +P2 + ^iPl,P2) = 0. 

In many cases, £ is a convex function with i{0, 0) = 0. Under a simple resistive model, i 
is a convex quadratic function of pi and p2- Under a model for AC power transmission, the 
feasible region defined by power loss is given by an ellipse jLTZ12j . 

Battery. A battery is a single terminal energy storage device with power schedule Phat, 
which can take in or deliver energy, depending on whether it is charging or discharging. The 
charging and discharging rates are limited by the constraints < p^^^^ < (^max^ where 

(-.max ^ j^T ^max ^ j^T 

are the maximum charging and discharging rates. At time t, 
the charge level of the battery is given by local variables 

g(r)=g-* + X:Pbat(t), r = l,...,T, 
t=i 

where g™'* is the initial charge. It has zero cost function and the charge level must not exceed 
the battery capacity, i.e., < g(r) < Q'^'^^, r = 1, . . . ,T. It is common to constrain the 
terminal battery charge q{T) to be some specified value or to match the initial charge 

More sophisticated battery models include (possibly state-dependent) charging and dis- 
charging inefficiencies as well as charge leakage. In addition, they can include costs which 
penalize excessive charge-discharge cycling. 
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Fixed load. A fixed energy load is a single terminal device with zero cost function which 
consists of a desired consumption profile, / G R^. This consumption profile must be satisfied 
in each period, i.e., we have the constraint pioad = ^■ 

Thermal load. A thermal load is a single terminal device with power schedule Ptherm which 
consists of a heat store (room, cooled water reservoir, refrigerator), with temperature profile 
9 e R^, which must be kept within minimum and maximum temperature limits, e R"^ 
and e R^. The temperature of the heat store evolves according to 

9{t + 1) = 9{t) + (/x/c)(r-'^(r) - 9{t)) - {v/c)p,^erAr), r = 1, . . . , T - 1, ^(1) = e^'*, 

where < pthcrm < 7f™ax ^j^g cooling power consumption profile, H^^^ E R^ is the 
maximum cooling power, /j, is the ambient conduction coefficient, t] is the heating/cooling 
efficiency, c is the heat capacity of the heat store, 9^^ e R^ is the ambient temperature 
profile, and is the initial temperature of the heat store. A thermal load has zero cost 
function. 

More sophisticated models include temperature-dependent cooling and heating efficien- 
cies for heat pumps, more complex dynamics of the system whose temperature is being 
controlled, and additional additive terms in the thermal dynamics, to represent occupancy 
or other heat sources. 

Deferrable load. A deferrable load is a single terminal device with zero cost function that 
must consume a minimum amount of power over a given interval of time, which is charac- 
terized by the constraint Y.T=AP^oaxii'T) > E, where E is the minimum total consumption for 
the time interval t — A, . . . , D. The energy consumption in each time period is constrained 
by < pioad < L™^. In some cases, the load can only be turned on or off in each time 
period, i.e., pioad(T) e {0, L™^} for r = A, . . . , D. 

Curtailable load. A curtailable load is a single terminal device which docs not impose 
hard constraints on its power requirements, but instead penalizes the shortfall between a 
desired load profile I e R^ and dehvered power. In the case of a hnear penalty, its cost 
function is given by 

Oi{l -pioad) + , 

where {z)+ = max(0, z), p\oad £ R-"^ is the amount of electricity delivered to the device, and 
q; > is a penalty parameter. 

Electric vehicle. An electric vehicle is a single terminal device with power schedule Pev 
which has a desired charging profile e R^ and can be charged within a time interval 
T — A, . . . , D. To avoid excessive charge cycling, we assume that the electric vehicle battery 
cannot be discharged back into the grid (in more sophisticated vehicle-to-grid models, this 
assumption is relaxed), so we have the constraints < Pev < C"^'^, where (7™*^ e r'^ 
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is the maximum charging rate. We assume that c'^'^^{t) = for r ^ {A, . . . ,D}, and for 
T = A, . . . , D, the charge level is given by 

g(r) = r^* + EPev(t), 

t=A 

where is the initial charge of the vehicle when it is plugged in at time t = A. 

We can model electric vehicle charging as a deferrable load, where we require a given 
charge level to be achieved at some time. A more realistic model is as a combination of a 
deferrable and curtailable load, with cost function 

af:(c<^-(r)-g(r))+, 

T=A 

where a > is a penalty parameter. Here c'^'^'^{t) is the desired charge level at time r, and 
d^^^i^r) — ^(t))^ is the shortfall. 

External tie with transaction cost. An external tie is a connection to an external 
source of power. We represent this as a single terminal device with power schedule Pex- In 
this case, Pexi^)- = niax{— Pcx(t), 0} is the amount of energy pulled from the source, and 
Pex(''")+ = niax{pex(T), 0} is the amount of energy delivered to the source, at time r. We 
have the constraint |pex(T)| < -E'™^^(r), where E"™^^ g R"^ is the transaction limit. 

We suppose that the prices for buying and selling energy are given by c ± 7 respectively, 
where c(r) is the midpoint price, and 7(r) > is the difference between the price for buying 
and selling [i.e., the transaction cost). The cost function is then 

-(C - 7)^(Pex)+ + (C + 7)^(Pex)- = "c'^Pcx + 7^bex|, 

where \pex\, {.Pex)+, and (pex)- are all interpreted elementwise. 

4 Convexity 
4.1 Devices 

We call a device convex if its objective function is a convex function. A network is convex 
if all of its devices are convex. For convex networks, the OPSP is a convex optimization 
problem, which means that in principle we can efficiently find a global solution |B V04j . 
When the network is not convex, even finding a feasible solution for the OPSP can become 
difficult, and finding and certifying a globally optimal solution to the OPSP is generally 
intractable. However, special structure in many practical power distribution problems allows 
us to guarantee optimality in certain cases. 

In the examples from ^ the battery, fixed load, thermal load, curtailable load, electric 
vehicle, and external tie are all convex devices using the constraints and objective functions 
given. A deferrable load is convex if we drop the constraint that it can only be turned on 
or off. We discuss the convexity properties of the generator and transmission line in the 
following section. 
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Figure 3: Left: Cost function for a generator that can be turned off. Right: Its 
convex relaxation. 

4.2 Relaxations 

One technique to deal with non-convex networks is to use convex relaxations. We use the 
notation g^"^"^ to denote the convex envelope [Roc 70] of the function g. There are many 
equivalent definitions for the convex envelope, for example, g^""^ = (g*)*, where g* denotes 
the convex conjugate of the function g. We can equivalently define g'^'^^ to be the largest 
convex lower bound of g. If (7 is a convex, closed, proper (CCP) function, then g = g^'^^. 
We define the relaxed optimal power scheduling problem (rOPSP) as 

minimize f^^^{p) /^\ 
subject to p = 0, 

with variable p : T ^ R"^. This is a convex optimization problem, whose optimal value 
can in principle be computed efficiently, and whose optimal objective value is a lower bound 
for the optimal objective value of the OPSP. In some cases, we can guarantee a priori that 
a solution to the rOPSP will also be a solution to the OPSP based on a property of the 
network objective such as monotonicity. Even when the relaxed solution does not satisfy 
all of the constraints in the unrelaxed problem, it can be used as a starting point to help 
construct good, local solutions to the unrelaxed problem. The suboptimality of these local 
solutions can then be bounded by the gap between their network objective and the lower 
bound provided by the solution to the rOPSP. If this gap is small for a given local solution, 
we can guarantee that it is nearly optimal. 

Generator. When a generator is modeled as in ^and is always powered on, it is a convex 
device. However, when given the ability to be powered on and off, the generator is no longer 
convex. In this case, we can relax the generator objective function so that its cost for power 
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Figure 4: Left: Feasible sets of a transmission lines with no loss (black) and AC loss 
(grey). Right: Their convex relaxations. 

production in each time period, given in figure [3l is a convex function. This allows the 
generator to produce power in the interval [0, p™™]. 

Transmission line. In a lossless transmission line, we have i{pi,P2) = 0, and thus the set 
of feasible power schedules is the line segment 

L = {{pi,P2) I Pi = -P2, P2 e [-C°-72, 

as shown in figure H] in black. When the transmission line has losses, in most cases the loss 
function £ is a convex function of the input and output powers, which leads to a feasible 
region like the grey arc in figure HI For example, using a lumped 11 model and under the 
common assumption that the voltage magnitude is fixed |B V99] . a transmission line with 
series admittance g — ib gives the quadratic loss 

i{pi,P2) = -i9/mPi+P2)Vg' + iPi-P2)Vb'). (5) 

The feasible set of a relaxed transmission line is given by the convex hull of the original 
transmission line's constraints. The right side of figure H] shows examples of this for both 
lossless and lossy transmission lines. Physically, this relaxation gives lossy transmission lines 
the ability to discard some additional power beyond what is simply lost to heat. Since 
electricity is generally a valuable commodity in power networks, the transmission lines will 
generally not throw away any additional power in the optimal solution to the rOPSP, leading 
to the power line constraints in the rOPSP being tight and thus also satisfying the unrelaxed 
power line constraints in the original OPSP. As was shown in |LTZ12] . when the network is 
a tree, this relaxation is always tight. In addition, when all locational marginal prices are 
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positive and no other non-convexities exist in the network, the tightness of the hne constraints 
in the rOPSP can be guaranteed in the case of networks that have separate phase shifters 
on each loop in the networks whose shift parameter can be freely chosen |SL12j . 



5 Decentralized method 

We begin this section by deriving the prox-average message passing equations assuming 
that all the device objective functions are convex closed proper (CCP) functions. We then 
compare the computational and communication requirements of prox-average message pass- 
ing with a centralized solver for the OPSP. The additional requirements that the functions 
are closed and proper are technical conditions that are in practice satisfied by any convex 
function used to model devices. We note that we do not require either finiteness or strict con- 
vexity of any device objective function, and that all results apply to networks with arbitrary 
topology. 

Whenever we have a set of variables that maps terminals to time periods, x : T — )■ 
(which we can also associate with a |T| x T matrix), we will use the same index and over-line 
notation for the variables x as we do for power schedules p. For example, Xt G R"^ consists 
of the time period vector of values of x associated with terminal t and Xt = (l/|n|) J^t'Gn^f, 
where t & n, with similar notation for indexing x by devices and nets. 



5.1 Prox-average message passing 

We derive the prox-average message passing equations by reformulating the OPSP using 
the alternating direction method of multipliers (ADMM) and then simplifying the resulting 
equations. We refer the reader to [BPC"*"!!] for a thorough overview of ADMM. 
We first rewrite the OPSP as 

minimize J2dev fd{Pd) + En^M 9n{zn) 

subject to p = z, ^ ' 

with variables p, z : T ^ R"^, where gn{zn) is the indicator function on the set {z^ '■ z^ = 
0}. We use the notation from [BPC"*"!!] and, ignoring a constant, form the augmented 
Lagrangian 

Lp{p,z,u) = ^ fdiPd) + 9n{zn) + {p/2)\\p-z + u\\l, (7) 

deV neAf 

with the scaled dual variables u = y/ p : T ^ H^, which we associate with a |T| x T matrix. 
Because devices and nets are each partitions of the terminals, the last term of can be 
split across either devices or nets, i.e., 

{p/'^)\\P - Z + U\\l = J2{p/2)\\pd- Zd + Ud\\l = J2 {p/2)\\pr, - Z^ + UnWl- 
deV n£Af 
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:= arj 
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The resulting ADMM algorithm is then given by the iterations 

{fM + ip/2)\\p,-z', + 4\\l), dev, 

(g{zn) + {p/2)\\zn - ul - p^+i^) , n e AT, 

where the first step is carried out in parallel by all devices, and then the second and third 
steps are carried out in parallel by all nets. 

Since gnizn) is simply an indicator function for each net n, the second step of the algorithm 
can be computed analytically and is given by 

^k-\-l — k I /c+l —k — /c+l 

By plugging this quantity into the u update step, the algorithm can be simplified further to 
yield the prox-average message passing algorithm: 

1. Proximal power schedule update. 

pr':=Prox^,>d-Pd-«d), dev. (8) 

2. Scaled price update. 



u„ := u 



p7\ neN. (9) 



The proximal function for a function / is given by 

proxy (x) = argmin (f{y) + {p/2)\\x - yUa) , (10) 
y ^ ' 

which is guaranteed to exist when / is CCP |Roc70j . 

We can now see where the name prox-average message passing comes from. In each 
iteration, every device computes the proximal function of its objective function, with an 
argument that depends on messages passed to it through its terminals by its neighboring 
nets in the previous iteration {p\ and m^). Then, every devices passes to its terminals the 
newly computed power schedules, p^^^, which are then passed to the terminals' associated 
nets. Every net computes its new average power imbalance, p^"*"^, updates its dual variables, 
M^"*"^, and broadcasts these values to its associated terminals' devices. Since p\ is simply |n| 
copies of the same vector for all fc, we can see that all terminals connected to the same net 
have the same value for their dual variables throughout the algorithm, i.e., for all values of 
/c, u\ = u^i whenever t,t' E n for any n G Af. 

As an example, consider the network represented by figures [1] and [2j The prox-average 
algorithm performs the power schedule update on the devices (the boxes on the left in figure 
E]). The devices share the respective power profiles via the terminals, and the nets (the solid 
boxes on the right) compute the scaled price update. For any network, the prox-average 
algorithm can be thought of as alternating between devices (on the left) and nets (on the 
right). 
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Convergence. We make a few comments about the convergence of prox-average message 
passing. Since prox-average message passing is a version of ADMM, all convergence results 
that apply to ADMM also apply to prox-average message passing. In particular, when all 
devices have closed, convex, proper (CCP) objective functions and a feasible solution to the 
OPSP exists, the following hold. 

1. Residual convergence, as k ^ oo, 

2. Objective convergence. Y.dev fdiPd) + Y.neAf 9niPn) T as k oo, 

3. Dual variable convergence, pu^ = — )■ as A; — t- oo, 

where /* is the optimal value for the OPSP, and y* are the optimal dual variables. The proof 
of these conditions can be found in |BPC"'"llj . As a result of the third condition, the optimal 
locational marginal prices £* can be found for each net G A/" by setting £* = y^/\n\. 

Stopping criterion. Following [BPC^ll] , we can define primal and dual residuals, which 
for prox-average message passing simplify to 

= p^, = p{{p^ - p^) - {p^^^ - p^'^)). 

We give a simple interpretation of each residual. The primal residual is simply the net 
power imbalance across all nets in the network, which is the original measure of primal 
feasibility in the OPSP. The dual residual is equal to the difference between the current and 
previous iterations of the difference between power schedules and their average net power. 
The locational marginal price at each net is determined by the deviation of all associated 
terminals' power schedule from the average power on that net. As the change in these 
deviations approaches zero, the corresponding locational marginal prices converge to their 
optimal values. 

We can define a simple criterion for terminating prox-average message passing when 

IF l|2 ^ e ) |P l|2 ^ e ; 

where e^"^' and e*^"^' are, respectively, primal and dual tolerances. We can normalize both of 
these quantities to network size by the relation 



pri dual abs 




for some absolute tolerance e > 0. 

Choosing a value of p. Numerous examples show that the value of p can have a dramatic 
effect on the rate of convergence of ADMM and prox-average message passing. Many good 
methods for picking p in both offline and online fashions are discussed in |BPC+llj . We 
note that unlike other versions of ADMM, the scaling parameter p enters very simply into 
the prox-average equations and can thus be modified online without incurring any additional 
computational penalties, such as having to re-factorize a matrix. 

We can modify the prox-average message passing algorithm with the addition of a third 
step 
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3. Parameter update and price rescaling. 



P 



,k+l 



u 



k+l 




for some function h. We desire to pick an h such that the primal and dual residuals are of 
similar size throughout the algorithm, i.e., p'^||r'^||2 ~ 115*^112 for all k. To accomplish this 
task, we use a simple proportional-derivative controller to update p, choosing h to be 



where = P^\\t^\\2/\\s'^\\2 — 1 and A and fi are nonnegative parameters chosen to control 
the rate of convergence. Typical values of A and p are between 10~^ and 10~^. 

When p is updated in such a manner, convergence is sped up in many examples, some- 
times dramatically. Although it can be difficult to prove convergence of the resulting algo- 
rithm, a standard trick is to assume that p is changed only for a large but bounded number 
of iterations, at which point it is held constant for the remainder of the algorithm, thus 
guaranteeing convergence. 

Non-convex case. When one or more of the device objective functions is non-convex, we 
can no longer guarantee that prox-average message passing converges to the optimal value 
of the OPSP or even that it converges at all {i.e., reaches a fixed point). Prox functions 
for non-convex devices must be carefully defined as the set of minimizers in ( [T0|) is no 
longer necessarily a singleton. Even when they can be defined, prox functions of non-convex 
functions are intractable to compute in many cases. 

One solution to these issues is to use prox-average message passing to solve the rOPSP. It 
is easy to show that f'^'^^ip) = Y^a&v fT^ iPd)- As a result, we can run prox-average message 
passing using the proximal functions of the relaxed device objective functions. Since /J"^^ 
is a CCP function for all d & V, prox-average message passing in this case is guaranteed 
to converge to the optimal value of the rOPSP and yield the optimal relaxed locational 
marginal prices. 

5.2 Discussion 

In order to compute the prox-average messages, devices and nets only require knowledge 
of who their network neighbors are, the ability to send small vectors of numbers to those 
neighbors in each iteration, and the ability to store small amounts of state information 
and efficiently compute prox functions (devices) or averages (nets). As all communication 
is local and peer-to-peer, prox-average message passing supports the ad hoc formation of 
power networks, such as micro grids, and is robust to device failure and unexpected network 
topology changes. 

Due to recent advances in convex optimization |WB10t IMB12t IMWBll] , many of the prox 
function calculations that devices must perform can be very efficiently executed at millisecond 



h{p'') = p^ exp(At;'' p{v^ - ^)), 
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or microsecond time-scales on inexpensive, embedded processors. Since all devices and all 
nets can each perform their computations in parallel, the time to execute a single, network 
wide prox-average message passing iteration (ignoring communication overhead) is equal to 
the sum of the maximum computation time over all devices and the maximum computation 
time of all nets in the network. As a result, the computation time per iteration is small and 
essentially independent of the size of the network. 

In contrast, solving the OPSP in a centralized fashion requires complete knowledge of the 
network topology, sufficient communication bandwidth to centrally aggregate all devices ob- 
jective function data, and sufficient centralized computational resources to solve the resulting 
OPSP. In large, real-world networks, such as the smart grid, all three of these requirements 
are generally unattainable. Having accurate and timely information on the global connec- 
tivity of all devices is infeasible for all but the smallest of dynamic networks. Centrally 
aggregating all device objective functions would require not only infeasible bandwidth and 
data storage requirements at the aggregation site, but also the willingness of all devices to 
expose what could be proprietary function parameters in their objective functions. Finally, 
a centrahzed solution to the OPSP requires solving an optimization problem with Q{\T\T) 
variables, which leads to an identical lower bound on the time scaling for a centralized solver, 
even if problem structure is exploited. As a result, the centralized solver cannot scale to solve 
the OPSP on very large networks. 

6 Numerical examples 

We illustrate the speed and scaling of prox-average message passing with a range of numerical 
examples. In the first two sections, we describe how we generate network instances for our 
examples. We then describe our implementation, showing how multithreading can exploit 
problem parallelism and how our method would scale in a fully peer-to-peer implementation. 
Lastly, we present our results, and demonstrate how the number of prox-average iterations 
needed for convergence is essentially independent of network size and also significantly de- 
creases when the algorithm is seeded with a reasonable warm-start. 

6.1 Network topology 

We generate a network instance by first picking the number of nets N. We generate the 
locations Xi e R^, i = 1, . . . , N hy drawing them uniformly at random from [0, y/N^. (These 
locations will be used to determine network topology.) Next, we introduce transmission lines 
into the network as follows. We first connect a transmission line between all pairs of nets i 
and j independently and with probability 

7(i,j) = amm{l,£/\\xi - Xj\\l). 

In this way, when the distance between i and j is smaller than d, they are connected with 
a fixed probability a > 0, and when they are located farther than distance d apart, the 
probability decays as — After this process, we add a transmission line between 
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Figure 5: A sample random network. Devices are color-coded: generators are in 
green, batteries are in blue, and loads are in red. Edges represent transmission lines. 

any isolated net and its nearest neighbor. We then introduce transmission lines between 
distinct connected components by selecting two connected components uniformly at random 
and then selecting two nets, one inside each component, uniformly at random and connecting 
them by a transmission line. We continue this process until the network is connected. 

For the network instances we present, we chose parameter values d = 0.15 and a = 0.8 as 
the parameters for generating our network. This results in networks with an average degree 
of 2.3. Using these parameters, we generated networks with 100 to 100000 nets, which 
resulted in optimization problems with approximately 30 thousand to 30 million variables. 

6.2 Devices 

After we generate the network topology described above, we randomly attach a single (one- 
terminal) device to each net according to the distribution in table [TJ The models for each 
device in the network are identical to the ones given in section [3l with model parameters 
chosen in a manner we describe below. Figure |5] shows an example network for = 100 (30 
thousand variables) generated in this fashion. 

For simplicity, our examples only include networks with the devices listed below. For all 
devices, the time horizon was chosen to be T = 96, indicating 15 minute intervals for a 24 
hour power schedule, with the time period r = 1 corresponding to midnight. 

Generator. Generators have the quadratic cost functions given in ^and are divided into 
three types: small, medium, and large. In each case, we allow the generator to be turned on 
and off by setting Pmin = 0. Small generators have the smallest maximum power output, but 
the largest ramp rates, while large generators have the largest maximum power output, but 
the slowest ramp rates. Medium generators lie in between. Large generators are generally 
more efficient than small and medium generators which is reflected in their cost function by 
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Device 


Fraction 


Generator 


0.2 


Battery 


0.1 


Fixed load 


0.5 


Deferrable load 


0.1 


Curtailable load 


0.1 



Table 1: Fraction of one-terminal devices present in the generated networks. 





^min 


pma,x 




a 


/3 


Large 





50 


3 


0.001 


0.1 


Medium 





20 


5 


0.005 


0.2 


Small 





10 


10 


0.02 


1 



Table 2: Generator parameters. 



having smaller values of a and /3. Whenever a generator is placed into a network, its type 
is selected uniformly at random, and its parameters are taken from the appropriate row in 
table 

Battery. For a given instance of a battery, its parameters are generated by setting g'"* = 
and selecting Q™*^^ uniformly at random from the interval [20,50]. The charging and 
discharging rates are selected to be equal {i.e., C*™*" = D"^^^^ and drawn uniformly at 
random from the interval [5, 10]. 

Fixed load. The load profile for a fixed load instance is a sinusoid, 

/(r) = c + asin(27r(r - (f)o)/T), r = 1, . . . ,T, 

with the amplitude a chosen uniformly at random from the interval [1, 5], and the DC term 
c chosen so that c = a + u, where u is chosen uniformly at random from the interval [0, 0.5], 
which ensures that the load profile remains elementwise positive. The phase shift 0o is chosen 
uniformly at random from the interval [60, 72], ensuring that the load profile peaks between 
the hours of 3pm and 6pm. 

Deferrable load. For an instance of a deferrable load, we choose E uniformly at random 
from the interval [500, 1000]. The start time index A is chosen uniformly at random from 
the discrete set {1, . . . , T — 7}. The end time index D is then chosen uniformly at random 
over the set {A + 7, . . . , T}, so that the minimum time window to satisfy the load is 8 time 
periods (2 hours). We set the maximum power so that it is possible to satisfy the total energy 
constraint by only operating in half of the available time periods, i.e., L™^^ = 2E/ [D — A). 
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Curtailable loads. For an instance of a curtailable load, the desired load / is constant 
over all time periods with a magnitude chosen uniformly at random from the interval [5, 15]. 
The penalty parameter a is chosen uniformly at random from the interval [1, 2]. 

Transmission line. For an instance of a transmission line, we choose its parameters by 
first solving the OPSP with lossless, uncapacitated lines, where we add a small quadratic 
cost function t{j)\ + ^2); with e = 10~^, to each transmission line in order to help spread 
power flow throughout the network. Using the flow values given by the solution to that 
problem, we set C*™^^ = max(10, 4F™'^^) for each line, where F™*^^ is equal to the maximum 
flow (from the solution to the lossless problem) along that line over all time periods. We 
use the loss function for transmission lines with a series admittance g — ib given by (JSj). We 
choose g and b such that b = '-/g, where 7 is chosen uniformly at random from the interval 
[4.5,5.5]; on average, the susceptance is 5 times larger than the conductance. After we pick 
^max -j-j-^g values of g and b are chosen such that the loss when transmitting power at 

maximum capacity is uniformly at random between 5 to 15 percent of C^^^. 

6.3 Serial multithreaded implementation 

Our OPSP solver is implemented in C++, with the core prox-average message passing equa- 
tions occupying fewer than 25 lines of C++ (excluding problem setup and classes). The code 
is compiled with gcc 4.4.5 on an 8-core, 3.4GHz Intel Xeon processor with 16GB of RAM 
running the Debian OS. We used the compiler option -03 to leverage full code optimization. 

To approximate a fully distributed implementation, we use gee's implementation of 
OpenMP (version 3.0) and multithreading to parallelize the computation of the prox func- 
tions for the devices. We use 8 threads (one per core) to solve each example. Assuming 
perfect load balancing, this means that 8 prox functions are being evaluated in parallel. 
Effectively, we evaluate the prox functions by stepping serially through the devices in blocks 
of size 8. We do not, however, parallelize the computation of the dual update over the nets 
since the overhead of spawning threads dominates the vector operation itself. 

The prox functions for fixed loads and curtailable loads are separable over r and can 
be computed analytically. For more complex devices, such as a generator, battery, or de- 
ferrable load, we compute the prox function using CVXGEN |MB12j . The prox function for 
a transmission line is computed by projecting onto the convex hull of the line constraints. 

For a given network, we solve the associated OPSP with an absolute tolerance e^^*^ = 10~^. 
This translates to three digits of accuracy in the solution. The CVXGEN solvers used to 
evaluate the prox operators for some devices have an absolute tolerance of 10~^. For our 
p-update function, /i, we use the parameter values A = 0.005 and /i = 0.01 and clip our 
values of p to be between e^*^*^ and Xje'^^^ to prevent roundoff error. 

6.4 Peer-to-peer implementation 

We have not yet created a fully peer-to-peer, bulk synchronous parallel |Val90l |MAB"'"10] 
implementation of prox-average message passing, but have carefully tracked prox-average 
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solve times in our serial implementation in order to facilitate a first order analysis of such 
a system. In a peer-to-peer implementation, the proximal power schedule updates occur in 
parallel across all devices followed by (scaled) price updates occurring in parallel across all 
nets. As previously mentioned, the computation time per prox-average iteration is thus the 
maximum time, over all devices, to evaluate the proximal function of their objective, added 
to the maximum time across all nets to average their terminal power schedules and add that 
quantity to their existing price vector. Since evaluating the prox function for some devices 
requires solving a convex optimization problem, whereas the price update only requires a 
small number of vector operations that can be performed as a handful of SIMD instructions, 
the compute time for the price update is negligible in comparison to the proximal power 
schedule update. The determining factor in solve time, then, is in evaluating the prox 
functions for the power schedule update. In our examples, the maximum time taken to 
evaluate any prox function is 1 ms. To solve a problem with N = 100000 nets (30 million 
variables) with approximately 500 iterations of our prox-average algorithm then takes only 
500 ms. 

In practice, the actual solve time would clearly be dominated by network communication 
latencies and actual runtime performance will be determined by how quickly and reliably 
packets can be delivered. As a result, in a true peer-to-peer implementation, a negligible 
amount of time is actually spent on computation. However, it goes without saying that many 
other issues must be addressed with a peer-to-peer protocol, including handling network 
delays and security. 

6.5 Results 

We first consider a single example: a network instance with N = 3000 (1 million variables). 
Figure [6] shows that after fewer than 500 iterations of prox-average message passing, both 
the relative suboptimality and the average net power imbalance are both less than 10~^. 
The convergence rates for other network instances over the range of sizes we simulated are 
similar. 

In figure [TJ we present average timing results for solving the OPSP for a family of exam- 
ples, with networks of size N = 100, 300, 1000, 3000, 10000, 30000, and 100000. For each 
network size, we generated and solved 20 network instances to generate average solve times 
and confidence intervals around those averages. For network instances with = 100000 
nets, the problem has approximately 30 million variables, which we solve serially using prox- 
average message passing in 52 minutes on average. 

For a peer-to-peer implementation, the runtime of prox-average message passing should 
be essentially constant, and in particular independent of the size of the network. For our 
multithreaded implementation, with bounded computation, this would be reflected by a 
runtime that only increases linearly with the number of nets in a network instance. By fitting 
a line to figure [TJ we find that our parallel implementation scales as 0(A^'^'^^^). The slight 
discrepancy between this and the ideal exponent of 1 is accounted for by implementation 
details such as operating system background processes consuming some compute cycles and 
slightly imperfect load balancing across all 8 cores in our system, especially for smaller values 
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Figure 6: The relative suboptimality {left) and primal infeasibility {right) of prox- 
average message passing on a network instance with = 3000 nets (1 million vari- 
ables). The dashed line shows when the stopping criterion is satisfied. 

of N. 

We note that figure [7] shows cold start runtimes for solving the OPSP. If we have access 
to a good estimate of the power schedules and locational marginal prices for each terminal, 
we can use these estimates to warm start our OPSP solver. To show the effect of warm- 
starting, we solve a specific problem instance with N = 3000 nets (1 million variables). We 
define K'^°^'^ to be the number of iterations needed to solve an instance of this problem. We 
then uniformly scale the load profiles of each device by separate and independent lognormal 
random variables. The new load profiles, /, are obtained from the original load profiles / 
according to 

/ = / exp{aX), 

where X ~ A/'(0, 1), and a > is given. Using the solution of the original problem to warm 
start our solver, we solve the perturbed problem and report the number of iterations K""^™ 
needed to solve it. Figure |H] shows the ratio /('wami^^coid vary a, and indicates the 

significant computational savings that warm-starting can achieve, even under relatively large 
perturbations. 

7 Extensions 

Receding horizon control. The speed with which prox-average message passing con- 
verges on very large networks shows its applicability in coordinating real time decisions 
across massive networks of devices. A direct extension of our work to real-time network 
operation can be achieved using receding horizon control (RHC) |Mac02l iBemOGj . In RHC, 
we solve the OPSP at each time step to determine conditional consumption and generation 
profiles for every device over the next T time periods. We then execute the first step of 
these profiles and resolve the OPSP using new information and measurements that have 



24 



10000 



1000 



02 

Pi 
o 

100 



10 



.1" 



. A 



100 



1000 



N 



10000 



100000 



Figure 7: Average prox-average execution times for a family of networks on 8 cores. 
Error bars show 95% confidence bounds. The dotted line shows the least-squares fit 
to the data on a log-log scale, resulting in an exponent of 0.923. 
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Figure 8: Number of warm start iterations needed to converge for various perturba- 
tions of load profiles. 
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become available. RHC has been successfully applied to a wide range of areas, including 
power systems, and allows us to take advantage of warm starting our algorithm, which we 
have shown to significantly decrease the number of iterations needed for convergence. 

Hierarchical models. The power gird has a natural hierarchy, with generation and trans- 
mission occurring at the highest level and residential consumption and distribution occurring 
at the most granular. Prox-average message passing can be easily extended into hierarchi- 
cal interactions by scheduling messages on different time scales and between systems at the 
similar levels of the hierarchy |CLCD07] . 

We can recursively apply prox-average message passing at each level of the hierarchy. 
At the highest level, all regional systems exchange their proximal updates once they have 
computed their own prox-function. It can be shown that computing this function for a 
given region is equivalent to computing a partial minimization over the sum of the objective 
functions of devices located inside that region, subject to intra-region power balance. This 
too can be computed using prox-average message passing. This process can be continued 
down to the individual device level, at which point the device must solve its own prox function 
directly as the base case. 

Local stopping criteria and p updates. The stopping criterion and the algorithm we 
propose to update p in [|5]both currently require global device coordination — specifically 
the global values of the primal and dual residuals at each iteration. In principle, these could 
be computed in a decentralized fashion across the network by gossip algorithms |Sha08] . 
but that would require many rounds of gossip in between each iteration of prox-average 
message passing, significantly increasing runtime. We are currently investigating methods 
by which both the stopping criterion and different values of p can be independently chosen 
by individual devices or even individual terminals, all based only on local information, such 
as the primal and dual residuals of a given device and its neighboring nets. 

8 Conclusion 

We have presented a fully decentralized method for dynamic network energy management 
based on message passing between devices. Prox-average message passing is simple and 
highly extensible, relying solely on peer to peer communication between devices that ex- 
change energy. When the resulting network optimization problem is convex, prox-average 
message passing converges to the optimal value and gives optimal locational marginal prices. 
We have presented a parallel implementation that shows the time per iteration and the 
number of iterations needed for convergence of prox-average message passing are essentially 
independent of the size of the network. As a result, prox-average message passing can scale 
to extremely large networks with almost no increase in solve time. 
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